1. Introduction

Sample selection models provide an approach to correcting for nonrandom sampling that is important in econometrics. Pioneering work in this area includes Gronau (1973) and Heckman (1974). This paper is about two-step estimation of these models without restricting the functional form of the selection correction. The estimators are particularly simple, using polynomial or spline approximations to correct for selection. Asymptotic normality and consistency of an asymptotic variance estimator are shown.

Some of the estimators considered here are similar to two-step least squares estimators with flexible correction terms previously proposed by Lee (1982) and Heckman and Robb (1987). The theory here allows the functional form of the correction to be entirely unknown, with the number of approximating functions growing with the sample size to achieve $\sqrt{n}$-consistency and asymptotic normality. Also, this paper adds to the menu of approximations by considering new types of power series, along with regression splines that are important in statistical approximation theory (e.g. Stone, 1985).

Early work on semiparametric estimation of sample selection models includes Cosslett (1991) and Gallant and Nychka (1987). These papers do not have asymptotic normality results. Powell (1987) and Ahn and Powell (1993) give distribution theory for density weighted kernel estimators. The series estimators analyzed here have the virtue of being extremely easy to implement. Also, some of the estimators are new, including the regression splines. Practical experience with these estimators is given in Newey, Powell, and Walker (1990).

Section 2 of the paper presents the model and discusses identification. The estimators are described in Section 3, and Section 4 gives the asymptotic theory.

2. The Model and Identification


The selection model considered here is

(2.1)   $y = x'\beta_0 + \varepsilon$,   $y$ only observed if $d = 1$,   $d \in \{0, 1\}$,

        $E[\varepsilon \mid w, d=1] = E[\varepsilon \mid v(w,\alpha_0), d=1]$,   $\Pr(d = 1 \mid w) = \pi(v(w,\alpha_0))$,   $x \subseteq w$.


Here the conditional mean of the disturbance, given selection and $w$, depends only on the index $v = v(w,\alpha_0)$. This restriction is implied by other familiar conditions, such as independence of disturbances and regressors; see Powell (1994). A basic implication of this model is that

(2.2)   $E[y \mid w, d=1] = x'\beta_0 + h_0(v)$,   $h_0(v) = E[\varepsilon \mid w, d=1]$.


The function $h_0(v)$ is a familiar selection correction. For example, if $d = 1(v + \eta \ge 0)$, $(\varepsilon, \eta)$ is independent of $w$, $\eta$ has a standard normal distribution, and $E[\varepsilon \mid \eta]$ is linear in $\eta$, then $h_0(v)$ is a linear function of $\phi(v)/\Phi(v)$, where $\Phi(v)$ and $\phi(v)$ are the standard normal CDF and p.d.f. respectively. This term is the correction term considered by Heckman (1976). In this paper we allow $h_0(v)$ to have an unknown functional form.
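
A quick numerical check of the Gaussian example may help. The sketch below (an illustration, not the paper's procedure) simulates $d = 1(v + \eta \ge 0)$ with standard normal $\eta$ and $\varepsilon = a + b\eta +$ independent noise, then compares the mean of $\varepsilon$ among selected observations with $a + b\,\phi(v)/\Phi(v)$. The helper names and the values of $a$ and $b$ are hypothetical; only the Python standard library and NumPy are assumed.

```python
import math
import numpy as np

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(v):
    """phi(v)/Phi(v) = E[eta | eta >= -v] for standard normal eta."""
    return norm_pdf(v) / norm_cdf(v)

def simulated_correction(v, a, b, n=400_000, seed=0):
    """Monte Carlo E[eps | v, d = 1] when d = 1(v + eta >= 0), eta ~ N(0, 1),
    and eps = a + b*eta + independent N(0, 1) noise, so E[eps | eta] = a + b*eta."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    eps = a + b * eta + rng.standard_normal(n)
    return eps[v + eta >= 0].mean()

A, B = 0.0, 0.5  # hypothetical coefficients of the linear E[eps | eta]
for v in (-1.0, 0.0, 1.5):
    print(f"v={v:+.1f}  simulated={simulated_correction(v, A, B):.3f}  "
          f"a + b*mills={A + B * inverse_mills(v):.3f}")
```

Changing `A` shifts both columns by the same constant, which is one way to see why a constant in $x$ cannot be separated from $h_0(v)$.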

Equation (2.2) is an additive semiparametric regression like that considered by Robinson (1988), except that the variable $v = v(w,\alpha_0)$ depends on unknown parameters. Making use of this information is important for identification. Ignoring the structure implied by equation (2.1), and regarding $h_0$ as an unknown function of variables in $w$, would mean that any component of $x$ that is included in those variables would not be identified.

The identification condition for this paper is

Assumption 1: $M = E[d\,(x - E[x \mid v, d=1])(x - E[x \mid v, d=1])']$ is nonsingular, i.e. for any $\lambda \ne 0$ there is no measurable function $f(v)$ such that $x'\lambda = f(v)$ when $d = 1$.


This condition was imposed by Cosslett (1991), and is the selection model version of Robinson's (1988) identification condition for additive semiparametric regression. As shown by Chamberlain (1986), this condition is not necessary for identification, but it is necessary for existence of a (regular) $\sqrt{n}$-consistent estimator. It is important to note that this condition does not allow for a constant term in $x$, because it is not separately identified from $h_0(v)$.

More primitive conditions for Assumption 1 are available in some cases. A simple sufficient condition is that $\mathrm{Var}(x)$ is nonsingular and the conditional distribution of $v$ given $x$ has an absolutely continuous component with conditional density that is positive on the entire real line for almost all $x$. An obvious necessary condition is that $v$ not be a linear combination of $x$, requiring that something in $v$ be excluded from $x$. Such an exclusion restriction is implied by many economic models, where $d$ is a choice variable and $v$ includes a price variable for another choice.
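
The two ways Assumption 1 can fail that were just described, a constant term in $x$ and an index with no excluded variable, show up directly in a sample analogue of $M$. The sketch below is a simulation under assumed designs (uniform regressors, every simulated row treated as selected) that approximates $E[x \mid v, d=1]$ by a power series in $v$ and reports the smallest eigenvalue of the resulting matrix; `sample_M` and `min_eig` are hypothetical helpers.

```python
import numpy as np

def sample_M(X, v, K=6):
    """Sample analogue of M = E[d (x - E[x|v,d=1])(x - E[x|v,d=1])'] using only
    selected rows (here every simulated row is treated as selected, d = 1).
    E[x | v, d = 1] is approximated by a power series of degree K - 1 in v."""
    P = np.vander(v, K, increasing=True)              # columns 1, v, ..., v^(K-1)
    coef, *_ = np.linalg.lstsq(P, X, rcond=None)      # series regression of x on v
    U = X - P @ coef                                  # x minus fitted E[x | v]
    return U.T @ U / len(v)

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

rng = np.random.default_rng(1)
n = 2000
x1 = rng.uniform(-1, 1, n)
z = rng.uniform(-1, 1, n)          # hypothetical excluded variable (e.g. another price)
v_excl = 0.5 * x1 + z              # index with an exclusion restriction
v_lin = 0.5 * x1                   # index that is a linear combination of x

m_ok = min_eig(sample_M(x1[:, None], v_excl))                           # bounded away from 0
m_const = min_eig(sample_M(np.column_stack([np.ones(n), x1]), v_excl))  # constant in x
m_noexcl = min_eig(sample_M(x1[:, None], v_lin))                        # no exclusion
print(m_ok, m_const, m_noexcl)
```

With the excluded variable `z` the smallest eigenvalue stays well away from zero, while both failure cases give eigenvalues at the level of rounding error.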

Identification of $\beta_0$ from equation (2.2) also requires identification of $\alpha_0$. Here no specific assumptions will be imposed, in order to allow flexibility in the choice of an estimator of $\alpha_0$. Of course, consistency of $\hat\alpha$ will imply identification of $\alpha_0$, but different consistent estimators $\hat\alpha$ may correspond to different identifying assumptions. For brevity, a menu of different assumptions is not discussed here.


3. Estimation

The type of estimator we consider is a two-step estimator, where the first step is a semiparametric estimator $\hat\alpha$ of the selection parameters $\alpha_0$ and the second step is least squares regression on $x$ and approximating functions of $\hat v = v(w,\hat\alpha)$ in the selected data. These estimators are analogous to Heckman's (1976) two-step procedure for the case of Gaussian disturbances. The difference is that $\alpha_0$ is estimated by a distribution-free method rather than by probit, and a nonparametric approximation to $h_0(v)$ is used in the second step regression rather than the inverse Mills ratio.

There are many distribution-free estimators available for the first step, including those of Manski (1975), Cosslett (1983), and Ruud (1986). The first step will need to be $\sqrt{n}$-consistent, like the estimators of Powell, Stock, and Stoker (1989), Ichimura (1993), and Cavanagh and Sherman (1997). Also, the asymptotic variance of $\hat\beta$ will be an increasing function of the asymptotic variance of $\hat\alpha$, so an efficient estimator like that of Klein and Spady (1993) may be useful.

The second step consists of a linear regression of $y$ on $x$ and functions of $\hat v$ that can approximate $h_0(v)$. To describe the estimator let $\tau(v,\eta)$ denote some strictly monotonic transformation of $v$, depending on parameters $\eta$. This transformation is useful for adjusting the location and scale of $v$, as discussed below. Let $p^K(\tau) = (p_{1K}(\tau), \ldots, p_{KK}(\tau))'$ be a vector of functions with the property that for large $K$ a linear combination of $p^K(\tau)$ can approximate an unknown function of $\tau$. Suppose that the data are $z_i = (d_i, w_i, d_i y_i)$, $(i = 1, \ldots, n)$, assumed throughout to be i.i.d. Let $\hat\eta$ denote an estimator of $\eta$, $\hat v_i = v(w_i, \hat\alpha)$, $\hat\tau_i = \tau(\hat v_i, \hat\eta)$, and $\hat p_i = p^K(\hat\tau_i)$, where a $K$ superscript for $\hat p_i$ is suppressed for notational convenience. For $\mathbf{x} = [d_1 x_1, \ldots, d_n x_n]'$, $\mathbf{y} = (d_1 y_1, \ldots, d_n y_n)'$, $\hat{\mathbf{P}} = [d_1 \hat p_1, \ldots, d_n \hat p_n]'$, and $\hat{\mathbf{Q}} = \hat{\mathbf{P}}(\hat{\mathbf{P}}'\hat{\mathbf{P}})^{-1}\hat{\mathbf{P}}'$, the estimator is

(3.1)   $\hat\beta = \hat M^{-1}\mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{y}/n$,   $\hat M = \mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{x}/n$,


where the inverses will exist in large samples under conditions discussed below. The estimator $\hat\beta$ is the coefficient of $x_i$ from the regression of $y_i$ on $x_i$ and $\hat p_i$ in the selected data.
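
Because $\hat M^{-1}\mathbf{x}'(I-\hat{\mathbf{Q}})\mathbf{y}/n$ is the partialled-out (Frisch-Waugh-Lovell) form of least squares, (3.1) can be computed by residualizing $x$ and $y$ on the approximating functions. The sketch below assumes the selected rows and a basis matrix are already in hand; the function name, the simulated design, and the choice $h_0(v) = \exp(v)$ are illustrative assumptions.

```python
import numpy as np

def series_two_step_beta(X, y, P):
    """Second step of (3.1): beta_hat = M_hat^{-1} x'(I - Q_hat) y / n with
    Q_hat = P(P'P)^{-1}P'. Only selected rows are passed in; the d_i = 0 rows of
    the paper's matrices are zero, and the 1/n scaling cancels in beta_hat."""
    resid_X = X - P @ np.linalg.lstsq(P, X, rcond=None)[0]   # (I - Q_hat) x
    resid_y = y - P @ np.linalg.lstsq(P, y, rcond=None)[0]   # (I - Q_hat) y
    M_hat = X.T @ resid_X
    return np.linalg.solve(M_hat, X.T @ resid_y)

rng = np.random.default_rng(0)
n = 500
v = rng.uniform(-1, 1, n)                                   # selected-sample index values
X = np.column_stack([v + rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + np.exp(v) + 0.1 * rng.normal(size=n)  # hypothetical h0
P = np.vander(v, 5, increasing=True)                        # power series 1, v, ..., v^4
beta = series_two_step_beta(X, y, P)
full = np.linalg.lstsq(np.column_stack([X, P]), y, rcond=None)[0]
print(beta, full[:2])
```

The two printed vectors agree up to rounding, which is the sense in which $\hat\beta$ is the coefficient of $x_i$ in the regression of $y_i$ on $x_i$ and $\hat p_i$.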

This estimator depends on the choice of approximating functions and transformation. Here we consider two kinds of approximating functions, power series and splines. For power series the approximating functions are given by

(3.2)   $p_{kK}(\tau) = \tau^{k-1}$.


Depending on the transformation $\tau(v,\eta)$, this power series can lead to several different types of sample selection corrections. Three examples are a power series in the index $v$, in the inverse Mills ratio $\phi(\cdot)/\Phi(\cdot)$, or in the normal CDF $\Phi(\cdot)$. When a nonlinear transformation of $v$ is used (e.g. for a power series in $\Phi$), it may be appropriate to undo a location and scale normalization imposed on most semiparametric estimators of $v(w,\alpha)$. To this end let $\hat\eta = (\hat\eta_1, \hat\eta_2)'$ be the coefficients from probit estimation with regressors $(1, \hat v_i)$, where we do not impose normality (but will require that $\hat\eta$ be a $\sqrt{n}$-consistent estimator of some population parameter). Then the transformed observations for the three examples will be

(3.3a)   $\hat\tau_i = \hat v_i$,

(3.3b)   $\hat\tau_i = \phi(\hat\eta_1 + \hat\eta_2 \hat v_i)/\Phi(\hat\eta_1 + \hat\eta_2 \hat v_i)$,

(3.3c)   $\hat\tau_i = \Phi(\hat\eta_1 + \hat\eta_2 \hat v_i)$.


The power series in equation (3.3a) will have as a leading term the index $v$ itself. The one from equation (3.3b) will have leading term given by the inverse Mills ratio, so that the first term is the Heckman (1976) correction. This one also has approximating functions that preserve a shape property of $h_0(v)$ that holds when $d = 1(v + \eta \ge 0)$ and $(\varepsilon, \eta)$ are independent of $v$: $h_0(v)$ goes to zero as $v$ gets large. The last example will correspond to a power series in the selection probability for Gaussian $\eta$.
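
The three transformations (3.3a)-(3.3c) are easy to compute once $\hat\eta$ is available. The sketch below fits the probit of $d_i$ on $(1, \hat v_i)$ by Fisher scoring, written out with NumPy and `math.erf` rather than a statistics package, and then builds the power series (3.2) in the transformed index. All function names and the simulated design are hypothetical; the probit is used only as a location and scale device, as in the text.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def Phi(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def phi(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_eta(d, v_hat, iters=100, tol=1e-10):
    """Probit of d on (1, v_hat) by Fisher scoring. Used only to relocate and
    rescale v_hat; normality of the true selection error is not assumed."""
    Z = np.column_stack([np.ones_like(v_hat), v_hat])
    eta = np.zeros(2)
    for _ in range(iters):
        z = Z @ eta
        F = np.clip(Phi(z), 1e-12, 1 - 1e-12)
        f = phi(z)
        score = Z.T @ (f * (d - F) / (F * (1 - F)))               # gradient of log-likelihood
        info = Z.T @ (Z * (f ** 2 / (F * (1 - F)))[:, None])      # expected information
        step = np.linalg.solve(info, score)
        eta = eta + step
        if np.max(np.abs(step)) < tol:
            break
    return eta

def tau_hat(v_hat, eta, kind):
    """Transformations (3.3a) 'index', (3.3b) 'mills', (3.3c) 'cdf'."""
    v_hat = np.asarray(v_hat, dtype=float)
    if kind == "index":
        return v_hat
    z = eta[0] + eta[1] * v_hat
    if kind == "mills":
        return phi(z) / Phi(z)
    if kind == "cdf":
        return Phi(z)
    raise ValueError(f"unknown transformation: {kind}")

def power_series(tau, K):
    """p_kK(tau) = tau^(k-1), k = 1, ..., K, as in (3.2)."""
    return np.vander(tau, K, increasing=True)

rng = np.random.default_rng(0)
v_hat = rng.normal(size=2000)                     # hypothetical first-step index values
d = (0.3 + 1.2 * v_hat + rng.normal(size=2000) >= 0).astype(float)
eta = probit_eta(d, v_hat)                        # near (0.3, 1.2) in this Gaussian design
P_mills = power_series(tau_hat(v_hat, eta, "mills"), K=4)
```

In an actual second step only the rows with $d_i = 1$ of the basis matrix would enter the regression.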

Replacing power series by corresponding polynomials that are orthogonal with respect to some weight function may help avoid multicollinearity. For example, for $\hat\tau_u = \max_{i \le n, d_i = 1}\{\tau(\hat v_i, \hat\eta)\}$ and $\hat\tau_\ell = \min_{i \le n, d_i = 1}\{\tau(\hat v_i, \hat\eta)\}$, one could replace $\tau^{k-1}$ by a polynomial of order $k$ that is orthogonal for the uniform weight on $[-1,1]$, evaluated at $\hat\tau_i = [2\tau(\hat v_i, \hat\eta) - \hat\tau_u - \hat\tau_\ell]/(\hat\tau_u - \hat\tau_\ell)$. Of course, $\hat\beta$ is not affected by such a replacement, since it is just a nonsingular linear transformation of the power series.
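
The invariance claim can be checked numerically. The sketch below (simulated data, hypothetical helper names) rescales the index to $[-1, 1]$ with the selected-sample extremes, then computes $\hat\beta$ once with the raw power series and once with Legendre polynomials, which are orthogonal for the uniform weight on $[-1,1]$, using NumPy's `legendre.legvander`.

```python
import numpy as np
from numpy.polynomial import legendre

def rescale(tau, selected):
    """Map tau into [-1, 1] with the max and min over selected observations."""
    hi, lo = tau[selected].max(), tau[selected].min()
    return (2 * tau - hi - lo) / (hi - lo)

def coef_on_x(X, y, P):
    """Coefficient on X from regressing y on [X, P], i.e. beta_hat in (3.1)."""
    coef = np.linalg.lstsq(np.column_stack([X, P]), y, rcond=None)[0]
    return coef[: X.shape[1]]

rng = np.random.default_rng(2)
n, K = 400, 6
tau = rng.uniform(0.2, 3.0, n)                       # hypothetical transformed index
X = (tau + rng.normal(size=n))[:, None]
y = 0.7 * X[:, 0] + np.log(tau) + 0.1 * rng.normal(size=n)
t = rescale(tau, np.ones(n, dtype=bool))             # all rows treated as selected
P_pow = np.vander(t, K, increasing=True)             # 1, t, ..., t^(K-1)
P_leg = legendre.legvander(t, K - 1)                 # Legendre P_0(t), ..., P_{K-1}(t)
print(coef_on_x(X, y, P_pow), coef_on_x(X, y, P_leg))
print(np.linalg.cond(P_pow), np.linalg.cond(P_leg))
```

The coefficients match up to rounding, while the condition number of the Legendre basis is smaller, which is the multicollinearity benefit mentioned above.
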

An alternative approximation that is better in several respects than power series is splines, which are piecewise polynomials. Splines are less sensitive to outliers and to singularities in the function being approximated. Also, as discussed below, asymptotic normality holds under weaker conditions for splines than for power series. For theoretical convenience attention is limited to splines with evenly spaced knots on $[-1,1]$. For $b_+ = 1(b > 0) \cdot b$, a spline of degree $m$ in $\tau$ with $L$ evenly spaced knots on $[-1,1]$ can be based on

(3.4)   $p_{kK}(\tau) = \tau^{k-1}$,   $1 \le k \le m+1$,

        $p_{kK}(\tau) = \{[\tau + 1 - 2(k-m-1)/(L+1)]_+\}^m$,   $m+2 \le k \le m+1+L \equiv K$.

An alternative, equivalent series that is less subject to multicollinearity problems is B-splines; e.g. see Powell (1981).
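
A direct transcription of (3.4) into code may make the indexing concrete. This is a sketch assuming degree $m \ge 1$ (for $m = 0$ NumPy evaluates $0^0$ as 1, so the truncation would need separate handling); the function name is hypothetical, and in practice the better-conditioned B-spline basis mentioned above would usually be preferred.

```python
import numpy as np

def spline_basis(tau, m, L):
    """Truncated power spline basis of (3.4): degree m >= 1 with L evenly spaced
    knots on [-1, 1]. Returns an n x K matrix with K = m + 1 + L."""
    tau = np.asarray(tau, dtype=float)
    cols = [tau ** (k - 1) for k in range(1, m + 2)]               # 1, tau, ..., tau^m
    for k in range(m + 2, m + 2 + L):
        b = tau + 1 - 2 * (k - m - 1) / (L + 1)                    # tau minus the knot
        cols.append(np.where(b > 0, b, 0.0) ** m)                  # (b_+)^m
    return np.column_stack(cols)

tau = np.linspace(-1, 1, 9)
B = spline_basis(tau, m=3, L=2)     # cubic spline, knots at -1/3 and 1/3, K = 6
print(B.shape)
```

The $j$-th truncated column is identically zero to the left of the knot $-1 + 2j/(L+1)$, which is what makes the basis local.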

Fixed, evenly spaced knots are restrictive, and are motivated by theoretical convenience. Allowing the knots to be estimated may improve the approximation, but would make computation more difficult and require substantial modification to the theory of Section 4, which relies on approximations that are linear in parameters.

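For inference it is important to have a consistent estimator of the asymptotic variance of $\hat\beta$. This can be formed by treating the approximation as if it were exact and using formulae for parametric two-step estimators such as those of Newey (1984). The estimator will depend on a consistent estimator $\hat V(\hat\alpha)$ of the asymptotic variance of $\sqrt{n}(\hat\alpha - \alpha_0)$. Let $\hat\beta$ and $\hat\gamma$ be the estimates from the regression of $d_i y_i$ on $d_i x_i$ and $d_i \hat p_i$, $\hat\varepsilon_i = d_i(y_i - x_i'\hat\beta - \hat p_i'\hat\gamma)$ the corresponding residual, and $\hat h(v) = p^K(\tau(v, \hat\eta))'\hat\gamma$ the estimate of $h_0(v)$ obtained from this regression. Define $\hat{\mathbf{u}} = (I - \hat{\mathbf{Q}})\mathbf{x}$ to be the matrix of residuals from the regression of $d_i x_i$ on $d_i \hat p_i$, so that $\mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{x} = \hat{\mathbf{u}}'\hat{\mathbf{u}}$, and let

(3.5)   $\hat V(\hat\beta) = \hat M^{-1}\left[\sum_{i=1}^n \hat u_i \hat u_i' \hat\varepsilon_i^2/n + \hat H \hat V(\hat\alpha) \hat H'\right]\hat M^{-1}$,

        $\hat H = \sum_{i=1}^n \hat u_i\,[\partial \hat h(\hat v_i)/\partial v]\,\partial v(w_i, \hat\alpha)/\partial\alpha'/n$.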

This estimator is the sum of two terms, the first of which is the White (1980) specification robust variance estimator for the second step regression and the second a term that accounts for the first-stage estimation of the parameters of the selection equation. It can also be interpreted as the block of a joint variance estimator for $\hat\beta$ and $\hat\gamma$ corresponding to $\hat\beta$, where the joint estimator is formed as in Newey (1984). This estimator will be consistent for the asymptotic variance of $\sqrt{n}(\hat\beta - \beta_0)$ under the conditions of Section 4. Note here the normalization by the total sample size $n$ rather than the number of observations in the selected sample. For example, a 95 percent asymptotic confidence interval for $\beta_{0j}$ is $[\hat\beta_j - \hat V(\hat\beta)_{jj}^{1/2}\,1.96/\sqrt{n},\ \hat\beta_j + \hat V(\hat\beta)_{jj}^{1/2}\,1.96/\sqrt{n}]$.
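
The variance estimator (3.5) and the associated interval can be sketched as follows, for the special case (3.3a) with a linear index $v(w,\alpha) = w'\alpha$, so that $\partial v/\partial\alpha' = w$ and $\partial p^K/\partial v$ is the derivative of the power series. The first-step estimate and $\hat V(\hat\alpha)$ are placeholders here (assumptions, not outputs of any particular first-step estimator), and the function names are hypothetical.

```python
import numpy as np

def series_variance(X, y, P, dP_dv, dv_dalpha, V_alpha, n_total):
    """Sketch of (3.5). All row arrays hold only the selected rows (d_i = 1);
    rows with d_i = 0 are zero in the paper's sums, so dividing by the total
    sample size n_total gives the paper's normalization.
    dP_dv[i]: d p^K(tau(v, eta_hat))/dv at v_hat_i; dv_dalpha[i]: dv(w_i, alpha_hat)/dalpha'
    V_alpha: V_hat(alpha_hat), estimated variance of sqrt(n)(alpha_hat - alpha_0)."""
    k = X.shape[1]
    coef = np.linalg.lstsq(np.column_stack([X, P]), y, rcond=None)[0]
    beta, gamma = coef[:k], coef[k:]
    eps = y - X @ beta - P @ gamma                          # residuals eps_hat_i
    U = X - P @ np.linalg.lstsq(P, X, rcond=None)[0]        # u_hat = (I - Q_hat) x
    M = U.T @ U / n_total                                   # M_hat = u_hat'u_hat / n
    Omega = (U * eps[:, None] ** 2).T @ U / n_total         # White (1980) middle term
    dh_dv = dP_dv @ gamma                                   # d h_hat(v_hat_i)/dv
    H = (U * dh_dv[:, None]).T @ dv_dalpha / n_total        # H_hat
    M_inv = np.linalg.inv(M)
    return beta, M_inv @ (Omega + H @ V_alpha @ H.T) @ M_inv

def conf_int(beta, V, n_total, z=1.96):
    """Intervals beta_j -/+ z * V_jj^{1/2} / sqrt(n), n = total sample size."""
    half = z * np.sqrt(np.diag(V)) / np.sqrt(n_total)
    return np.column_stack([beta - half, beta + half])

# Hypothetical design: v = w'alpha with tau = v as in (3.3a), so dv/dalpha' = w
rng = np.random.default_rng(3)
n_total, K = 3000, 4
w = rng.normal(size=(n_total, 2))
alpha_hat = np.array([1.0, 0.5])                 # stands in for a first-step estimate
V_alpha = 0.5 * np.eye(2)                        # stands in for V_hat(alpha_hat)
v = w @ alpha_hat
eta = rng.normal(size=n_total)
d = v + eta >= 0
x = (w[:, 0] + rng.normal(size=n_total))[:, None]
y = 2.0 * x[:, 0] + 0.6 * eta + rng.normal(size=n_total)   # selection bias through eta

vs = v[d]
P = np.vander(vs, K, increasing=True)
dP = np.column_stack([k * vs ** (k - 1) if k > 0 else np.zeros_like(vs) for k in range(K)])
beta, V = series_variance(x[d], y[d], P, dP, w[d], V_alpha, n_total)
print(beta, conf_int(beta, V, n_total))
```

Setting `V_alpha` to zero drops the first-step correction and leaves only the White term; this can only shorten the intervals, since $\hat H \hat V(\hat\alpha) \hat H'$ is positive semidefinite.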


4. Asymptotic Normality

Some regularity conditions will be used to show consistency and asymptotic normality. The first condition is about the first stage estimator.

Assumption 2: There exists $\psi(w,d)$ such that for $\psi_i = \psi(w_i, d_i)$, $\sqrt{n}(\hat\alpha - \alpha_0) = \sum_{i=1}^n \psi_i/\sqrt{n} + o_p(1)$, $E[\psi_i] = 0$, and $E[\psi_i\psi_i']$ exists and is nonsingular. Also, $\hat V(\hat\alpha) \xrightarrow{p} V(\alpha) = E[\psi_i\psi_i']$.

This condition requires that $\hat\alpha$ be asymptotically equivalent to a sample average that depends only on $w$ and $d$. It is satisfied by many semiparametric estimators of binary choice models, such as that of Klein and Spady (1993).

The next condition imposes some moment conditions on the second stage.

Assumption 3: For some $\delta > 0$, $E[d\,\|x\|^{2+\delta}] < \infty$, $\mathrm{Var}(x \mid v, d=1)$ is bounded, and for $\varepsilon = d(y - x'\beta_0 - h_0(v))$, $E[\varepsilon^2 \mid v, d=1]$ is bounded.

The bounded conditional variance assumptions are standard in the literature, and will not be very restrictive here because $v$ will also be assumed to be bounded.

To control the bias of the estimator it is essential to impose some smoothness conditions on functions of $v$.


Assumption 4: $h_0(v)$ and $E[x \mid v, d=1]$ are continuously differentiable in $v$, of orders $s$ and $t$ respectively.


We also require that the transformation $\tau$ satisfy some properties.

Assumption 5: There is $\eta_0$ with $\sqrt{n}(\hat\eta - \eta_0) = O_p(1)$, and the distribution of $\tau(v(w,\alpha_0), \eta_0)$ has an absolutely continuous component with p.d.f. bounded away from zero on its support, which is compact. Also, the first and second partial derivatives of $v(w,\alpha)$ and $\tau(v,\eta)$ with respect to $\alpha$, $v$, and $\eta$ are bounded for $\alpha$ and $\eta$ in a neighborhood of $\alpha_0$ and $\eta_0$ respectively.


The first condition of this assumption means that the density of $\tau_i$ is bounded away from zero, which is useful for series estimation, but is restrictive. For example, if $v = x_1 + x_2$, where $x_1$ and $x_2$ are continuously distributed and independent, then the density of $v$, which is a convolution of the densities of $x_1$ and $x_2$, will be everywhere continuous; with compact support it must therefore fall to zero at the boundary, and hence cannot be bounded away from zero. It would be useful to weaken this condition, but this would be difficult and is beyond the scope of this paper.
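
The convolution example can be seen in a short simulation. Assuming, purely for illustration, that $x_1$ and $x_2$ are independent uniform on $[0,1]$, the density of $v = x_1 + x_2$ is the triangle $f(u) = \min(u, 2-u)$ on $[0,2]$, and the local histogram estimates below fall toward zero at the ends of the support. The helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
v = rng.uniform(0, 1, N) + rng.uniform(0, 1, N)   # hypothetical x1 + x2, support [0, 2]

def local_density(sample, point, width=0.05):
    """Histogram estimate of the density of `sample` on [point - width, point + width]."""
    return np.mean(np.abs(sample - point) <= width) / (2 * width)

# True density is the triangle f(u) = min(u, 2 - u): continuous and zero at 0 and 2.
for point in (0.05, 0.5, 1.0, 1.95):
    print(point, round(local_density(v, point), 3))
```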

The next assumption imposes growth rate conditions for the number of approximating terms.

Assumption 6: $K = K_n$ such that $\sqrt{n}K^{-s-t+1} \to 0$ and a) $p^K(\tau)$ is a power series, $s \ge 5$, and $K^7/n \to 0$; or b) $p^K(\tau)$ is a spline with $m \ge t-1$, $s \ge 3$, and $K^4/n \to 0$.


Here, splines require the minimum smoothness conditions and the least stringent growth rate for the number of terms, with $h_0(v)$ only required to be three times continuously differentiable. It is also of note that this assumption does not require undersmoothing. The presence of $t$ in the rate conditions means that smoothness in $E[x \mid v, d=1]$ can compensate for lack of smoothness in $h_0(v)$, so that the bias of $\hat h(v)$ does not have to go to zero faster than the variance. This absence of an undersmoothing requirement is a feature of series estimators of semiparametric regression models that has been previously noted in Donald and Newey (1994).
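
To see how the smoothness orders trade off against the growth rate of $K$, suppose $K$ grows like $n^r$. Taking the growth conditions to be $\sqrt{n}K^{-s-t+1} \to 0$ together with $K^7/n \to 0$ (power series, $s \ge 5$) or $K^4/n \to 0$ (splines, $s \ge 3$), the sketch below returns the open interval of admissible $r$. The function is a hypothetical helper and ignores constants and logarithmic factors.

```python
from fractions import Fraction

def admissible_rates(s, t, basis):
    """Open interval (lower, upper) of exponents r for which K ~ n^r satisfies
      sqrt(n) K^(-s-t+1) -> 0           : r > 1 / (2(s + t - 1));
      K^7/n -> 0 (power series, s >= 5) : r < 1/7;
      K^4/n -> 0 (splines, s >= 3)      : r < 1/4 (spline degree m >= t - 1 also needed).
    Returns None when the smoothness requirement fails or the interval is empty."""
    min_s, upper = {"power": (5, Fraction(1, 7)), "spline": (3, Fraction(1, 4))}[basis]
    if s < min_s:
        return None
    lower = Fraction(1, 2 * (s + t - 1))
    return (lower, upper) if lower < upper else None

print(admissible_rates(3, 1, "spline"))   # (Fraction(1, 6), Fraction(1, 4))
print(admissible_rates(3, 3, "spline"))   # smoother E[x|v] lowers the left endpoint
print(admissible_rates(5, 1, "power"))
```

Raising $t$ lowers the left endpoint, so a smoother $E[x \mid v, d=1]$ permits a slower-growing $K$; this is the compensation between $s$ and $t$ described above.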

Asymptotic normality of the two-step least squares estimator and consistency of the estimator of its asymptotic covariance matrix follow from the previous conditions. Let $u_i = d_i\{x_i - E[x_i \mid v_i, d_i = 1]\}$, $\Omega = E[\varepsilon_i^2 u_i u_i']$, and $H = E[u_i\{dh_0(v_i)/dv\}\,\partial v(w_i, \alpha_0)/\partial\alpha']$.

Theorem 1: If Assumptions 1-6 are satisfied and $\Omega$ is nonsingular then for $V(\beta) = M^{-1}(\Omega + HV(\alpha)H')M^{-1}$, $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V(\beta))$, and $\hat V(\hat\beta) \xrightarrow{p} V(\beta)$.

This result gives $\sqrt{n}$-consistency and asymptotic normality of the series estimators considered in this paper, which is useful for large sample inference. It would also be useful to have a way of choosing the number of functions in practice. A $K$ that minimizes goodness of fit criteria for the selection correction, such as cross-validation on the equation of interest, should satisfy the rate conditions of Assumption 6. In Newey, Powell and Walker (1990) such a criterion was used and gave reasonable results. However, the results of Donald and Newey (1994) and Linton (1995) for the partially linear model suggest that it may be optimal for estimation of $\beta_0$ to undersmooth, meaning $K$ should be larger than the minimizer of a goodness of fit criterion. Such results are beyond the scope of this paper, but remain an important topic for future research.
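
As one concrete goodness of fit criterion of the kind mentioned above, the sketch below computes leave-one-out cross-validation for the second-step regression over a grid of power-series lengths, using the standard identity that the deleted residual equals $e_i/(1-h_{ii})$. The helper names and the simulated design are assumptions; given the undersmoothing results cited above, the selected $K$ would be a lower benchmark rather than a final choice when $\beta_0$ is the target.

```python
import numpy as np

def loo_cv(X, y, P):
    """Leave-one-out CV criterion for the least squares regression of y on [X, P],
    computed with the hat-matrix identity e_(-i) = e_i / (1 - h_ii)."""
    Z = np.column_stack([X, P])
    G = np.linalg.pinv(Z.T @ Z)
    lev = np.einsum("ij,jk,ik->i", Z, G, Z)          # h_ii = z_i'(Z'Z)^{-1} z_i
    resid = y - Z @ (G @ (Z.T @ y))
    return np.mean((resid / (1.0 - lev)) ** 2)

def choose_K(X, y, tau, K_grid):
    """Hypothetical helper: K in K_grid minimizing LOO-CV with a power series in tau."""
    scores = {K: loo_cv(X, y, np.vander(tau, K, increasing=True)) for K in K_grid}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(5)
n = 300
tau = rng.uniform(-1, 1, n)
X = (tau + rng.normal(size=n))[:, None]
y = 1.5 * X[:, 0] + np.sin(2 * tau) + 0.2 * rng.normal(size=n)
K_best, scores = choose_K(X, y, tau, range(1, 9))
print(K_best)
```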




Appendix: Proof of Theorem 1


Throughout the Appendix $C$ will denote a positive constant that can be different in different uses. Also, we will use repeatedly the result that if $E[Y_n \mid \mathcal{X}_n] \xrightarrow{p} 0$ for a sequence of positive random variables $Y_n$ and conditioning sets $\mathcal{X}_n$, then $Y_n \xrightarrow{p} 0$.

To begin the proof, note that by $\partial v(w,\alpha)/\partial\alpha$ bounded and $\sqrt{n}$-consistency of $\hat\alpha$, and by $\partial\tau(v,\eta)/\partial v$ bounded (together with $\sqrt{n}$-consistency of $\hat\eta$ and boundedness of $\partial\tau(v,\eta)/\partial\eta$), $\max_i|\hat\tau_i - \tau_i| = O_p(1/\sqrt{n})$. Also, by the density of $\tau_i$ bounded away from zero, both $\min_i\tau_i$ and $\max_i\tau_i$ will be $\sqrt{n}$-consistent for the boundary points of the support of $\tau_i$, and hence so will $\min_i\hat\tau_i$ and $\max_i\hat\tau_i$. Therefore, by a location and scale transformation for power series, which will not change the regression, it can be assumed that $|\hat\tau_i| \le 1$ and $\max_i|\hat\tau_i - \tau_i| = O_p(1/\sqrt{n})$. Now, it follows from Assumption 6, as in Newey (1997), that for $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$, there is a nonsingular linear transformation $\tilde p^K(\tau)$ of $p^K(\tau)$ such that

(A.1)   $E[d_i\,\tilde p^K(\tau_i)\tilde p^K(\tau_i)'] = I$,   $\sup_{|\tau| \le 1}\|d^j\tilde p^K(\tau)/d\tau^j\| \le \zeta_j(K)$,

        $\zeta_1(K)K^{1/2}/\sqrt{n} \to 0$,   $\zeta_1(K)K^{-s+1} \to 0$,

        $\zeta_j(K) = CK^{1/2+j}$ for splines,   $\zeta_j(K) = CK^{1+2j}$ for power series.


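Since a nonsingular transformation does not change $\hat\beta$, it will be convenient to just let $p^K = \tilde p^K$. For $\mathbf{P} = [d_1p_1, \ldots, d_np_n]'$ with $p_i = p^K(\tau_i)$, as in Newey (1997), $\|\mathbf{P}'\mathbf{P}/n - I\| = O_p(\zeta_0(K)K^{1/2}/\sqrt{n}) \xrightarrow{p} 0$. Also, by the mean value theorem, $\max_i\|\hat p_i - p_i\| \le \zeta_1(K)\max_i|\hat\tau_i - \tau_i| = O_p(\zeta_1(K)/\sqrt{n})$, so that

        $\|\hat{\mathbf{P}}'\hat{\mathbf{P}}/n - \mathbf{P}'\mathbf{P}/n\| \le \|\hat{\mathbf{P}} - \mathbf{P}\|^2/n + 2\|\mathbf{P}\|\,\|\hat{\mathbf{P}} - \mathbf{P}\|/n = O_p(\zeta_1(K)^2/n + K^{1/2}\zeta_1(K)/\sqrt{n}) \xrightarrow{p} 0.$

Hence, by the triangle inequality,

(A.2)   $\|\hat{\mathbf{P}}'\hat{\mathbf{P}}/n - I\| \xrightarrow{p} 0.$

It follows, as in Newey (1997), that $\lambda(\hat{\mathbf{P}}'\hat{\mathbf{P}}/n) \ge C$ with probability approaching one, where $\lambda(A)$ denotes the smallest eigenvalue of a symmetric matrix $A$.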

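Next, since $\tau(v, \eta_0)$ is one-to-one, conditioning on $v$ is equivalent to conditioning on $\tau$, so that, for example, $h_0(v)$ can be regarded as a function of $\tau$. Let $\mu_i = d_iE[x_i \mid \tau_i, d_i = 1]$, $\boldsymbol\mu = [\mu_1, \ldots, \mu_n]'$, and $\hat{\boldsymbol\mu} = \hat{\mathbf{Q}}\mathbf{x}$, so that $\|\hat{\boldsymbol\mu} - \boldsymbol\mu\|^2/n = \mathrm{tr}(\mathbf{x}'\hat{\mathbf{Q}}\mathbf{x} - 2\mathbf{x}'\hat{\mathbf{Q}}\boldsymbol\mu + \boldsymbol\mu'\boldsymbol\mu)/n$. By $\lambda(\hat{\mathbf{P}}'\hat{\mathbf{P}}/n) \ge C$, $\hat{\mathbf{Q}}$ idempotent, and existence of the second moment of $x_i$, for $\hat{\mathbf{A}} = \hat{\mathbf{P}}(\hat{\mathbf{P}}'\hat{\mathbf{P}})^{-1}$,

        $\|\mathbf{x}'\hat{\mathbf{A}}\|^2 = \mathrm{tr}(\mathbf{x}'\hat{\mathbf{A}}\hat{\mathbf{A}}'\mathbf{x}) \le O_p(1)\,\mathrm{tr}(\mathbf{x}'\hat{\mathbf{Q}}\mathbf{x}/n) \le O_p(1)\,\mathrm{tr}(\mathbf{x}'\mathbf{x}/n) = O_p(1).$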

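It follows similarly that $\|\mathbf{x}'\mathbf{A}\| = O_p(1)$ for $\mathbf{A} = \mathbf{P}(\mathbf{P}'\mathbf{P})^{-1}$. Also, $\|\mathbf{x}'(\hat{\mathbf{P}} - \mathbf{P})/n\| \le \|\mathbf{x}\|\,\|\hat{\mathbf{P}} - \mathbf{P}\|/n = O_p(\zeta_1(K)/\sqrt{n}) \xrightarrow{p} 0$, so that for $\mathbf{Q} = \mathbf{P}(\mathbf{P}'\mathbf{P})^{-1}\mathbf{P}'$,

(A.3)   $\|\mathbf{x}'\hat{\mathbf{Q}}\mathbf{x}/n - \mathbf{x}'\mathbf{Q}\mathbf{x}/n\| \le \|\mathbf{x}'(\hat{\mathbf{P}} - \mathbf{P})\hat{\mathbf{A}}'\mathbf{x}/n\| + \|\mathbf{x}'\mathbf{A}(\hat{\mathbf{P}}'\hat{\mathbf{P}} - \mathbf{P}'\mathbf{P})\hat{\mathbf{A}}'\mathbf{x}/n\| + \|\mathbf{x}'\mathbf{A}(\hat{\mathbf{P}} - \mathbf{P})'\mathbf{x}/n\|$

        $\le \|\mathbf{x}'(\hat{\mathbf{P}} - \mathbf{P})/n\|\,(\|\hat{\mathbf{A}}'\mathbf{x}\| + \|\mathbf{A}'\mathbf{x}\|) + \|\mathbf{x}'\mathbf{A}\|\,\|(\hat{\mathbf{P}}'\hat{\mathbf{P}} - \mathbf{P}'\mathbf{P})/n\|\,\|\hat{\mathbf{A}}'\mathbf{x}\| \xrightarrow{p} 0.$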
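
The first inequality in (A.3) is the triangle inequality applied to an exact identity, $\hat{\mathbf Q} - \mathbf Q = (\hat{\mathbf P} - \mathbf P)\hat{\mathbf A}' - \mathbf A(\hat{\mathbf P}'\hat{\mathbf P} - \mathbf P'\mathbf P)\hat{\mathbf A}' + \mathbf A(\hat{\mathbf P} - \mathbf P)'$, with $\mathbf A = \mathbf P(\mathbf P'\mathbf P)^{-1}$ and $\hat{\mathbf A} = \hat{\mathbf P}(\hat{\mathbf P}'\hat{\mathbf P})^{-1}$. The sketch below checks this identity numerically on random matrices; the dimensions and perturbation size are arbitrary choices.

```python
import numpy as np

def proj(P):
    """Orthogonal projection Q = P(P'P)^{-1}P' onto the columns of P."""
    return P @ np.linalg.solve(P.T @ P, P.T)

rng = np.random.default_rng(6)
n, K = 50, 4
P = rng.normal(size=(n, K))
P_hat = P + 0.01 * rng.normal(size=(n, K))       # small perturbation, like p_hat_i vs p_i
A = P @ np.linalg.inv(P.T @ P)                   # A = P(P'P)^{-1}
A_hat = P_hat @ np.linalg.inv(P_hat.T @ P_hat)   # A_hat = P_hat(P_hat'P_hat)^{-1}

lhs = proj(P_hat) - proj(P)
rhs = (P_hat - P) @ A_hat.T - A @ (P_hat.T @ P_hat - P.T @ P) @ A_hat.T + A @ (P_hat - P).T
print(np.max(np.abs(lhs - rhs)))                 # zero up to rounding error
```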


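It follows similarly that $\mathbf{x}'\hat{\mathbf{Q}}\boldsymbol\mu/n - \mathbf{x}'\mathbf{Q}\boldsymbol\mu/n \xrightarrow{p} 0$. Therefore, for $\mathbf{u} = \mathbf{x} - \boldsymbol\mu$,

(A.4)   $\|\hat{\boldsymbol\mu} - \boldsymbol\mu\|^2/n = \mathrm{tr}(\mathbf{x}'\mathbf{Q}\mathbf{x} - 2\mathbf{x}'\mathbf{Q}\boldsymbol\mu + \boldsymbol\mu'\boldsymbol\mu)/n + o_p(1) = \mathrm{tr}(\mathbf{u}'\mathbf{Q}\mathbf{u} + \boldsymbol\mu'(I - \mathbf{Q})\boldsymbol\mu)/n + o_p(1).$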
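
The last equality in (A.4) uses $\mathbf x = \boldsymbol\mu + \mathbf u$, symmetry of $\mathbf Q$, and the fact that the cross terms $\boldsymbol\mu'\mathbf Q\mathbf u$ and $\mathbf u'\mathbf Q\boldsymbol\mu$ have equal trace and cancel. A numerical check, with arbitrary random matrices standing in for $\boldsymbol\mu$ and $\mathbf u$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, K, k = 40, 5, 2
P = rng.normal(size=(n, K))
Q = P @ np.linalg.solve(P.T @ P, P.T)
mu = rng.normal(size=(n, k))       # stands in for the conditional means mu_i
u = rng.normal(size=(n, k))        # stands in for the deviations u_i
x = mu + u
lhs = np.trace(x.T @ Q @ x - 2 * x.T @ Q @ mu + mu.T @ mu)
rhs = np.trace(u.T @ Q @ u + mu.T @ (np.eye(n) - Q) @ mu)
print(lhs - rhs)                   # zero up to rounding error
```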


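For $T = (\tau_1, \ldots, \tau_n)'$ and $D = (d_1, \ldots, d_n)'$, by independence of the observations, $E[u_i \mid T, D] = E[d_i(x_i - E[x_i \mid \tau_i, d_i = 1]) \mid \tau_i, d_i] = 0$. Therefore, for $i \ne j$, $E[u_i'u_j \mid T, D] = E[u_i'u_j \mid \tau_i, \tau_j, d_i, d_j] = E[u_i'E[u_j \mid u_i, \tau_i, \tau_j, d_i, d_j] \mid \tau_i, \tau_j, d_i, d_j] = E[u_i'E[u_j \mid \tau_j, d_j] \mid \tau_i, \tau_j, d_i, d_j] = 0$. Also, by Assumption 3, $E[u_i'u_i \mid T, D] = E[u_i'u_i \mid \tau_i, d_i] \le C$. Therefore, with probability one,

(A.5)   $E[\mathbf{u}\mathbf{u}' \mid T, D] \le CI.$

It follows that $E[\mathrm{tr}(\mathbf{u}'\mathbf{Q}\mathbf{u})/n \mid T, D] \le C\,\mathrm{tr}(\mathbf{Q})/n = CK/n \to 0$, so that $\mathrm{tr}(\mathbf{u}'\mathbf{Q}\mathbf{u})/n \xrightarrow{p} 0$.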
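
The bound $E[\mathrm{tr}(\mathbf u'\mathbf Q\mathbf u)/n \mid T, D] \le CK/n$ rests on $\mathrm{tr}(\mathbf Q) = K$. In the special case $E[\mathbf u\mathbf u' \mid T, D] = I$ (an assumption made only for this illustration, so $C = 1$ and the bound holds with equality), a Monte Carlo with a fixed power-series design reproduces $K/n$:

```python
import numpy as np

rng = np.random.default_rng(8)
n, K, reps = 200, 6, 2000
P = np.vander(rng.uniform(-1, 1, n), K, increasing=True)   # design held fixed (given T, D)
Q = P @ np.linalg.solve(P.T @ P, P.T)
draws = rng.normal(size=(reps, n))                          # u with E[uu' | T, D] = I
vals = np.einsum("ri,ij,rj->r", draws, Q, draws) / n        # u'Qu/n for each draw
print(np.trace(Q), vals.mean(), K / n)
```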

Also, by Assumption 4 and standard approximation theory results for power series and splines (e.g. see Newey, 1997 for references), and by $(I - \mathbf{Q})\mathbf{P} = 0$ and $I - \mathbf{Q}$ idempotent, there exists $\Pi_K$ such that

        $E[\mathrm{tr}(\boldsymbol\mu'(I - \mathbf{Q})\boldsymbol\mu)]/n = E[\mathrm{tr}((\boldsymbol\mu - \mathbf{P}\Pi_K')'(I - \mathbf{Q})(\boldsymbol\mu - \mathbf{P}\Pi_K'))]/n \le E[\mathrm{tr}((\boldsymbol\mu - \mathbf{P}\Pi_K')'(\boldsymbol\mu - \mathbf{P}\Pi_K'))]/n = E[d_i\{\mu_i - \Pi_Kp^K(\tau_i)\}'\{\mu_i - \Pi_Kp^K(\tau_i)\}] \to 0.$

Combining these results with equation (A.4) gives

(A.6)   $\|\hat{\boldsymbol\mu} - \boldsymbol\mu\|^2/n \xrightarrow{p} 0.$


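Since $(I - \hat{\mathbf{Q}})\mathbf{x} = \mathbf{u} + \boldsymbol\mu - \hat{\boldsymbol\mu}$, this implies that $\hat M - \mathbf{u}'\mathbf{u}/n \xrightarrow{p} 0$, while $\mathbf{u}'\mathbf{u}/n \xrightarrow{p} M$ follows by the law of large numbers. The triangle inequality then gives $\hat M \xrightarrow{p} M$.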

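Next, let $\boldsymbol\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ and $W = [w_1, \ldots, w_n]'$. It follows similarly to eq. (A.5) that $E[\boldsymbol\varepsilon\boldsymbol\varepsilon' \mid W, D] \le CI$. Then, since $\hat{\mathbf{Q}}$ and $\mathbf{Q}$ are functions of $W$ and $D$,

        $E[\|\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})\boldsymbol\varepsilon/\sqrt{n}\|^2 \mid W, D] = \mathrm{tr}\{\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})E[\boldsymbol\varepsilon\boldsymbol\varepsilon' \mid W, D](\hat{\mathbf{Q}} - \mathbf{Q})\mathbf{x}\}/n \le C\,\mathrm{tr}(\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})(\hat{\mathbf{Q}} - \mathbf{Q})\mathbf{x})/n.$

It follows similarly to equation (A.3) that $\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})\hat{\mathbf{Q}}\mathbf{x}/n \xrightarrow{p} 0$ and $\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})\mathbf{Q}\mathbf{x}/n \xrightarrow{p} 0$, so that $\|\mathbf{x}'(\hat{\mathbf{Q}} - \mathbf{Q})\boldsymbol\varepsilon/\sqrt{n}\| \xrightarrow{p} 0$, and hence $\mathbf{x}'(I - \hat{\mathbf{Q}})\boldsymbol\varepsilon/\sqrt{n} = \mathbf{x}'(I - \mathbf{Q})\boldsymbol\varepsilon/\sqrt{n} + o_p(1)$. It follows as in Donald and Newey (1994) that

(A.7)   $\mathbf{x}'(I - \hat{\mathbf{Q}})\boldsymbol\varepsilon/\sqrt{n} = \mathbf{u}'\boldsymbol\varepsilon/\sqrt{n} + o_p(1).$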


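For both power series and splines it follows as in Newey (1997) that there are $\gamma_K$ and $\Pi_K$ such that for $h_K(\tau) = p^K(\tau)'\gamma_K$, $\mu(\tau) = E[x \mid \tau, d=1]$, and $\mu_K(\tau) = \Pi_Kp^K(\tau)$,

(A.8)   $\sup_{|\tau| \le 1}|h_0(\tau) - h_K(\tau)| \le CK^{-s+1}$,   $\sup_{|\tau| \le 1}|dh_0(\tau)/d\tau - dh_K(\tau)/d\tau| \le CK^{-s+1}$,

        $\sup_{|\tau| \le 1}\|\mu(\tau) - \mu_K(\tau)\| \le CK^{-t}$.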

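Let $h_i = h_0(\tau_i)$, $\tilde h_i = h_0(\hat\tau_i)$, $h_{Ki} = h_K(\tau_i)$, $\tilde h_{Ki} = h_K(\hat\tau_i)$, $\mu_i = \mu(\tau_i)$, $\mu_{Ki} = \mu_K(\tau_i)$, $\tilde\mu_{Ki} = \mu_K(\hat\tau_i)$, and let expressions without the $i$ subscript denote corresponding matrices over all observations multiplied by selection indicators, e.g. $\boldsymbol\mu_K = [d_1\mu_{K1}, \ldots, d_n\mu_{Kn}]'$. Then, since $(I - \hat{\mathbf{Q}})\tilde{\mathbf{h}}_K = 0$ and $(I - \hat{\mathbf{Q}})\tilde{\boldsymbol\mu}_K = 0$, $\mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{h}/\sqrt{n} = \mathbf{x}'(I - \hat{\mathbf{Q}})(\mathbf{h} - \tilde{\mathbf{h}})/\sqrt{n} + (\mathbf{x} - \tilde{\boldsymbol\mu}_K)'(I - \hat{\mathbf{Q}})(\tilde{\mathbf{h}} - \tilde{\mathbf{h}}_K)/\sqrt{n}$. Let $\theta = (\alpha', \eta')'$, $\tau(w, \theta) = \tau(v(w,\alpha), \eta)$, and $h_{\theta i} = \partial h_0(\tau(w_i, \theta_0))/\partial\theta'$. Since $\partial\tau(w, \theta_0)/\partial\eta$ depends only on $v$ and $E[u_ia(v_i)] = 0$ for any function $a(v_i)$ with finite mean-square, $E[u_ih_{\theta i}] = E[u_i\{dh_0(v_i)/dv\}\,\partial v(w_i, \alpha_0)/\partial\alpha',\ 0] = [H, 0]$. It follows similarly to $\hat M \xrightarrow{p} M$ that $\mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{h}_\theta/n \xrightarrow{p} E[u_ih_{\theta i}]$. Then by a second-order expansion and $\sqrt{n}$-consistency of $\hat\theta$,

(A.9)   $\mathbf{x}'(I - \hat{\mathbf{Q}})(\mathbf{h} - \tilde{\mathbf{h}})/\sqrt{n} = -[\mathbf{x}'(I - \hat{\mathbf{Q}})\mathbf{h}_\theta/n]\sqrt{n}(\hat\theta - \theta_0) + o_p(1) = -E[u_ih_{\theta i}]\sqrt{n}(\hat\theta - \theta_0) + o_p(1)$

        $= -H\sqrt{n}(\hat\alpha - \alpha_0) + o_p(1).$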

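Also, by eq. (A.8) and $I - \hat{\mathbf{Q}}$ idempotent, $(\boldsymbol\mu - \boldsymbol\mu_K)'(I - \hat{\mathbf{Q}})(\tilde{\mathbf{h}} - \tilde{\mathbf{h}}_K)/\sqrt{n} = O_p(\sqrt{n}K^{-s-t+1}) \xrightarrow{p} 0$. Also, $(\boldsymbol\mu_K - \tilde{\boldsymbol\mu}_K)'(I - \hat{\mathbf{Q}})(\tilde{\mathbf{h}} - \tilde{\mathbf{h}}_K)/\sqrt{n} = O_p(K^{-s+1}) \xrightarrow{p} 0$ and $\mathbf{u}'(I - \hat{\mathbf{Q}})(\tilde{\mathbf{h}} - \tilde{\mathbf{h}}_K - \mathbf{h} + \mathbf{h}_K)/\sqrt{n} = O_p(K^{-s+1}) \xrightarrow{p} 0$. Also, it follows similarly to eq. (A.4) that for $\mathbf{e}_K = \mathbf{h} - \mathbf{h}_K$, $\mathbf{u}'(\hat{\mathbf{Q}} - \mathbf{Q})\mathbf{e}_K/\sqrt{n} = O_p(\zeta_1(K)K^{-s+1}) \xrightarrow{p} 0$. Also, $E[\|\mathbf{u}'\mathbf{Q}\mathbf{e}_K\|^2/n \mid T, D] = \mathbf{e}_K'\mathbf{Q}E[\mathbf{u}\mathbf{u}' \mid T, D]\mathbf{Q}\mathbf{e}_K/n \le C\mathbf{e}_K'\mathbf{Q}\mathbf{e}_K/n \le C\mathbf{e}_K'\mathbf{e}_K/n \to 0$, so that $\mathbf{u}'\mathbf{Q}\mathbf{e}_K/\sqrt{n} \xrightarrow{p} 0$; the same argument with $I$ in place of $\mathbf{Q}$ gives $\mathbf{u}'\mathbf{e}_K/\sqrt{n} \xrightarrow{p} 0$. Since $\mathbf{x} - \tilde{\boldsymbol\mu}_K = \mathbf{u} + (\boldsymbol\mu - \boldsymbol\mu_K) + (\boldsymbol\mu_K - \tilde{\boldsymbol\mu}_K)$, the triangle inequality then gives

(A.10)   $\mathbf{x}'(I - \hat{\mathbf{Q}})\tilde{\mathbf{h}}/\sqrt{n} = (\mathbf{x} - \tilde{\boldsymbol\mu}_K)'(I - \hat{\mathbf{Q}})(\tilde{\mathbf{h}} - \tilde{\mathbf{h}}_K)/\sqrt{n} \xrightarrow{p} 0.$

Combining equations (A.7), (A.9), and (A.10), we obtain

(A.11)   $\mathbf{x}'(I - \hat{\mathbf{Q}})(\boldsymbol\varepsilon + \mathbf{h})/\sqrt{n} = \mathbf{u}'\boldsymbol\varepsilon/\sqrt{n} - H\sqrt{n}(\hat\alpha - \alpha_0) + o_p(1) = \sum_{i=1}^n(u_i\varepsilon_i - H\psi_i)/\sqrt{n} + o_p(1).$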

The first conclusion then follows from $\hat M \xrightarrow{p} M$, the Lindeberg-Lévy central limit theorem, and $E[u_i\varepsilon_i\psi_i'] = E[u_iE[\varepsilon_i \mid w_i, d_i]\psi_i'] = 0$.

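To show the second conclusion, note that $d\hat h(\hat v_i)/dv = [d\hat h(\hat\tau_i)/d\tau]\,d\tau(\hat v_i, \hat\eta)/dv$, and it follows from Assumption 5 that $\sup_i|d\tau(\hat v_i, \hat\eta)/dv - d\tau(v_i, \eta_0)/dv| = O_p(1/\sqrt{n})$. Also, $\hat h(\tau) = p^K(\tau)'\hat\gamma$, $\hat\gamma = \hat{\mathbf{A}}'(\mathbf{y} - \mathbf{x}\hat\beta)$. Similarly to eq. (A.3), $\|\hat{\mathbf{A}}'\mathbf{x}(\hat\beta - \beta_0)\| \le O_p(1)\|\mathbf{x}(\hat\beta - \beta_0)/\sqrt{n}\| = O_p(1/\sqrt{n})$, $\|\hat{\mathbf{A}}'(\mathbf{h} - \tilde{\mathbf{h}})\| \le O_p(1)\|(\mathbf{h} - \tilde{\mathbf{h}})/\sqrt{n}\| = O_p(1/\sqrt{n})$, and $\|\hat{\mathbf{A}}'(\tilde{\mathbf{h}} - \hat{\mathbf{P}}\gamma_K)\| \le O_p(1)\|(\tilde{\mathbf{h}} - \hat{\mathbf{P}}\gamma_K)/\sqrt{n}\| = O_p(K^{-s+1})$. Similarly to previous results, $E[\boldsymbol\varepsilon'\hat{\mathbf{Q}}\boldsymbol\varepsilon \mid D, W] \le CK$, so that $\|\hat{\mathbf{A}}'\boldsymbol\varepsilon\|^2 = \boldsymbol\varepsilon'\hat{\mathbf{A}}\hat{\mathbf{A}}'\boldsymbol\varepsilon = O_p(1)\boldsymbol\varepsilon'\hat{\mathbf{Q}}\boldsymbol\varepsilon/n = O_p(K/n)$. Then by $\hat\gamma - \gamma_K = -\hat{\mathbf{A}}'\mathbf{x}(\hat\beta - \beta_0) + \hat{\mathbf{A}}'\boldsymbol\varepsilon + \hat{\mathbf{A}}'(\mathbf{h} - \tilde{\mathbf{h}}) + \hat{\mathbf{A}}'(\tilde{\mathbf{h}} - \hat{\mathbf{P}}\gamma_K)$ and the triangle inequality, $\|\hat\gamma - \gamma_K\| = O_p((K/n)^{1/2}) + O_p(K^{-s+1})$. Then for $j = 0$ or $1$,

(A.12)   $\sup_{|\tau| \le 1}|d^j\hat h(\tau)/d\tau^j - d^jh_0(\tau)/d\tau^j| \le \sup_{|\tau| \le 1}|[d^jp^K(\tau)/d\tau^j]'(\hat\gamma - \gamma_K)| + \sup_{|\tau| \le 1}|d^j[p^K(\tau)'\gamma_K]/d\tau^j - d^jh_0(\tau)/d\tau^j|$

         $\le \zeta_j(K)\|\hat\gamma - \gamma_K\| + O(K^{-s+1}) = O_p(\zeta_j(K)[(K/n)^{1/2} + K^{-s+1}]) = o_p(1).$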
P     s  p 
It follows that $\max_{i \le n}|d\hat h(\hat\tau_i)/d\tau - dh_0(\hat\tau_i)/d\tau| \xrightarrow{p} 0$. Also, since the conditions require that $h_0(\tau)$ be at least twice differentiable with bounded derivative, $\max_{i \le n}|dh_0(\hat\tau_i)/d\tau - dh_0(\tau_i)/d\tau| \xrightarrow{p} 0$, implying $\max_{i \le n}|d\hat h(\hat v_i)/dv - dh_0(v_i)/dv| \xrightarrow{p} 0$. Then, by boundedness of $\partial v(w_i, \alpha)/\partial\alpha$, for $\tilde H = n^{-1}\sum_{i=1}^n\hat u_i[\partial v(w_i, \hat\alpha)/\partial\alpha']\,dh_0(v_i)/dv$,

        $\|\hat H - \tilde H\| \le \mathrm{tr}(\hat{\mathbf{u}}'\hat{\mathbf{u}}/n)^{1/2}\left(\sum_{i=1}^n\|\partial v(w_i, \hat\alpha)/\partial\alpha\|^2/n\right)^{1/2}\max_{i \le n}|d\hat h(\hat v_i)/dv - dh_0(v_i)/dv| \xrightarrow{p} 0.$

It also follows by eq. (A.3) that for $\bar H = n^{-1}\sum_{i=1}^nu_i[\partial v(w_i, \alpha_0)/\partial\alpha']\,dh_0(v_i)/dv$, $\|\tilde H - \bar H\| \xrightarrow{p} 0$. Then since $\bar H \xrightarrow{p} H$ by the law of large numbers, $\hat H \xrightarrow{p} H$ follows by the triangle inequality.

Now,    let     A.   =  x'.  (/3-/3J  +  h.-h..      By  eq.    (A.12),     max.       |h.-h.|    < 
l  l  0  11  i^n     l     i 

max.      |h(x.)-h.(x.)|    +  max.      |  h_(x.)-h.(x.)  |   -^  0.     Also,     max.      |x'.  (/3-/3J I    < 
l^n         i       0     l  l^n     0     l       0     l  i^n      l  0 


max.      Nx.l 
i^n      l 


,S  0    .    ^     1/(2+5). „n  „      ..2+5.   .1/(2+5)^   ..  . n.  1/(2+5),,    mri   ,, , n.     P     n 

18-R    II   <  n  ().    ,  llx.ll        /n)  0   (1/vn)  =  n  0   (1)0   (1/vn)  — ^  0. 

0  ^i=l      l  p  P        P 


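Then by the triangle inequality $\max_{i\le n}|\Delta_i| \xrightarrow{p} 0$. Furthermore, by Assumption 3, $E[|\varepsilon_i| \mid W,D] \le C$, so that $E[\sum_{i=1}^n\|\hat{u}_i\|^2|\varepsilon_i|/n \mid W,D] = \sum_{i=1}^n\|\hat{u}_i\|^2E[|\varepsilon_i| \mid W,D]/n \le C\sum_{i=1}^n\|\hat{u}_i\|^2/n = O_p(1)$, and hence $\sum_{i=1}^n\|\hat{u}_i\|^2|\varepsilon_i|/n = O_p(1)$. Therefore,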

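(A.13) $\left\|\sum_{i=1}^n\hat{u}_i\hat{u}_i'\hat{\varepsilon}_i^2/n - \sum_{i=1}^n\hat{u}_i\hat{u}_i'\varepsilon_i^2/n\right\| \le \sum_{i=1}^n\|\hat{u}_i\|^2|\hat{\varepsilon}_i^2-\varepsilon_i^2|/n = \sum_{i=1}^n\|\hat{u}_i\|^2|(\varepsilon_i-\Delta_i)^2-\varepsilon_i^2|/n$

$\qquad \le 2\left(\sum_{i=1}^n\|\hat{u}_i\|^2|\varepsilon_i|/n\right)\max_{i\le n}|\Delta_i| + \left(\sum_{i=1}^n\|\hat{u}_i\|^2/n\right)\max_{i\le n}|\Delta_i|^2 \xrightarrow{p} 0.$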


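Also, note that $E[\sum_{i=1}^n\|\varepsilon_i\hat{u}_i-\varepsilon_iu_i\|^2/n \mid W,D] = E[\sum_{i=1}^n\varepsilon_i^2\|\hat{u}_i-u_i\|^2/n \mid W,D] \le \sum_{i=1}^nE[\varepsilon_i^2 \mid W,D]\|\hat{u}_i-u_i\|^2/n \le C\sum_{i=1}^n\|\hat{u}_i-u_i\|^2/n \xrightarrow{p} 0$ by equation (A.6). Therefore, $\sum_{i=1}^n\|\varepsilon_i\hat{u}_i-\varepsilon_iu_i\|^2/n \xrightarrow{p} 0$. Since also $\sum_{i=1}^n\varepsilon_i^2\|u_i\|^2/n = O_p(1)$ by the law of large numbers, it follows that $\sum_{i=1}^n\hat{u}_i\hat{u}_i'\varepsilon_i^2/n - \sum_{i=1}^nu_iu_i'\varepsilon_i^2/n \xrightarrow{p} 0$. Then by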

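the law of large numbers, $\sum_{i=1}^nu_iu_i'\varepsilon_i^2/n \xrightarrow{p} E[u_iu_i'\varepsilon_i^2] = \Omega$, so by the triangle inequality, $\sum_{i=1}^n\hat{u}_i\hat{u}_i'\hat{\varepsilon}_i^2/n \xrightarrow{p} \Omega$. The second conclusion then follows by consistency of the estimator of $V(\alpha)$ and the Slutsky theorem.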
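The step from $\sum_{i=1}^n\|\varepsilon_i\hat{u}_i-\varepsilon_iu_i\|^2/n \xrightarrow{p} 0$ to convergence of the second-moment matrices uses the identity $a_ia_i'-b_ib_i' = (a_i-b_i)(a_i-b_i)' + (a_i-b_i)b_i' + b_i(a_i-b_i)'$ with $a_i = \varepsilon_i\hat{u}_i$ and $b_i = \varepsilon_iu_i$. Together with Cauchy-Schwarz, this gives

$\left\|\sum_{i=1}^n(a_ia_i'-b_ib_i')/n\right\| \le \sum_{i=1}^n\|a_i-b_i\|^2/n + 2\left(\sum_{i=1}^n\|a_i-b_i\|^2/n\right)^{1/2}\left(\sum_{i=1}^n\|b_i\|^2/n\right)^{1/2}$,

so the left side vanishes when the first average does and $\sum_{i=1}^n\|b_i\|^2/n = O_p(1)$. The following numerical sketch of this inequality uses random placeholder vectors and a hypothetical helper name.

```python
import numpy as np


def second_moment(v):
    """n^{-1} sum_i v_i v_i' for a matrix v whose rows are the v_i'."""
    return v.T @ v / v.shape[0]


def moment_gap_and_bound(a, b):
    """Returns (||n^{-1} sum_i (a_i a_i' - b_i b_i')||, bound from the identity above)."""
    d2 = np.mean(np.sum((a - b) ** 2, axis=1))  # n^{-1} sum ||a_i - b_i||^2
    b2 = np.mean(np.sum(b**2, axis=1))          # n^{-1} sum ||b_i||^2
    lhs = np.linalg.norm(second_moment(a) - second_moment(b))  # Frobenius norm
    return float(lhs), float(d2 + 2.0 * np.sqrt(d2 * b2))


rng = np.random.default_rng(4)
n, k = 300, 3
eps = rng.normal(size=n)
u = rng.normal(size=(n, k))
b = eps[:, None] * u  # plays the role of eps_i * u_i
results = []
for s in (1.0, 0.1, 0.01):
    u_hat = u + s * rng.normal(size=(n, k))  # perturbed u_i standing in for u_hat_i
    results.append(moment_gap_and_bound(eps[:, None] * u_hat, b))
```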




References

Ahn, H. and J.L. Powell (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics 58, 3-29.

Chamberlain, G. (1986): "Asymptotic Efficiency in Semiparametric Models with Censoring," Journal of Econometrics 32, 189-218.

Cosslett, S.R. (1983): "Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica 51, 765-782.

Cosslett, S.R. (1991): "Distribution-Free Estimator of a Regression Model with Sample Selectivity," in W.A. Barnett, J.L. Powell and G. Tauchen, eds., Nonparametric and Semiparametric Methods in Econometrics and Statistics, Cambridge: Cambridge University Press.

Donald, S.G. and W.K. Newey (1994): "Series Estimation of Semilinear Models," Journal of Multivariate Analysis 50, 30-40.

Gallant, A.R. and D.W. Nychka (1987): "Semi-nonparametric Maximum Likelihood Estimation," Econometrica 55, 363-390.

Gronau, R. (1973): "The Effects of Children on the Housewife's Value of Time," Journal of Political Economy 81, S168-S199.

Heckman, J.J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica 42, 679-693.

Heckman, J.J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement 5, 475-492.

Heckman, J.J. and R. Robb (1987): "Alternative Methods for Evaluating the Impact of Interventions," Ch. 4 of Longitudinal Analysis of Labor Market Data, J.J. Heckman and B. Singer, eds., Cambridge, UK: Cambridge University Press.

Ichimura, H. (1993): "Estimation of Single Index Models," Journal of Econometrics 58, 71-120.

Klein, R.W. and R.S. Spady (1993): "An Efficient Semiparametric Estimator for Discrete Choice Models," Econometrica 61, 387-421.

Lee, L.F. (1982): "Some Approaches to the Correction of Selectivity Bias," Review of Economic Studies 49, 355-372.

Linton, O. (1995): "Second Order Approximation in a Partially Linear Regression Model," Econometrica 63, 1079-1112.

Manski, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics 3, 205-228.

Newey, W.K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics 79, 147-168.

Newey, W.K. and J.L. Powell (1993): "Efficiency Bounds for Semiparametric Selection Models," Journal of Econometrics 58, 169-184.



Newey, W.K., J.L. Powell, and J.R. Walker (1990): "Semiparametric Estimation of Selection Models: Some Empirical Results," American Economic Review Papers and Proceedings, May.

Powell, J.L. (1987): "Semiparametric Estimation of Bivariate Limited Dependent Variable Models," manuscript, University of California, Berkeley.

Powell, J.L. (1994): "Estimation of Semiparametric Models," in R.F. Engle and D. McFadden, eds., Handbook of Econometrics: Volume 4, New York: North-Holland.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica 57, 1403-1430.

Powell, M.J.D. (1981): Approximation Theory and Methods, Cambridge, UK: Cambridge University Press.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica 56, 931-954.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics 32, 157-187.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics 13, 689-705.

White, H. (1980): "Using Least Squares to Approximate Unknown Regression Functions," International Economic Review 21, 149-170.

